{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MiniRocket\n",
    "\n",
    "MiniRocket transforms input time series using a small, fixed set of convolutional kernels.  MiniRocket uses PPV pooling to compute a single feature for each of the resulting feature maps (i.e., the proportion of positive values). The transformed features are used to train a linear classifier.\n",
    "\n",
    "Dempster A, Schmidt DF, Webb GI (2020) MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification [arXiv:2012.08791](https://arxiv.org/abs/2012.08791)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 Univariate Time Series\n",
    "\n",
    "### 1.1 Imports\n",
    "\n",
    "Import example data, MiniRocket, `RidgeClassifierCV` (scikit-learn), and NumPy.\n",
    "\n",
    "**Note**: MiniRocket and MiniRocketMultivariate are compiled by Numba on import.  The compiled functions are cached, so this should only happen once (i.e., the first time you import MiniRocket or MiniRocketMultivariate)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:03.214929Z",
     "iopub.status.busy": "2020-10-12T17:43:03.214184Z",
     "iopub.status.idle": "2020-10-12T17:43:03.216304Z",
     "shell.execute_reply": "2020-10-12T17:43:03.216990Z"
    }
   },
   "outputs": [],
   "source": [
    "# !pip install --upgrade numba"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import RidgeClassifierCV\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "from sktime.datasets import (\n",
    "    load_arrow_head,  # univariate dataset\n",
    "    load_basic_motions,  # multivariate dataset\n",
    "    load_japanese_vowels,  # multivariate dataset with unequal length\n",
    ")\n",
    "from sktime.transformations.panel.rocket import (\n",
    "    MiniRocket,\n",
    "    MiniRocketMultivariate,\n",
    "    MiniRocketMultivariateVariable,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Load the Training Data\n",
    "\n",
    "For more details on the data set, see the [univariate time series classification notebook](https://github.com/sktime/sktime/blob/main/examples/02_classification.ipynb).\n",
    "\n",
    "**Note**: Input time series must be *at least* of length 9.  Pad shorter time series using, e.g., `PaddingTransformer` (`sktime.transformers.panel.padder`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.743652Z",
     "iopub.status.busy": "2020-10-12T17:43:08.741410Z",
     "iopub.status.idle": "2020-10-12T17:43:08.749009Z",
     "shell.execute_reply": "2020-10-12T17:43:08.749629Z"
    }
   },
   "outputs": [],
   "source": [
    "X_train, y_train = load_arrow_head(split=\"train\", return_X_y=True)\n",
    "# visualize the first univariate time series\n",
    "X_train.iloc[0, 0].plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Initialise MiniRocket and Transform the Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.753121Z",
     "iopub.status.busy": "2020-10-12T17:43:08.752621Z",
     "iopub.status.idle": "2020-10-12T17:43:08.941014Z",
     "shell.execute_reply": "2020-10-12T17:43:08.941496Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket = MiniRocket()  # by default, MiniRocket uses ~10_000 kernels\n",
    "minirocket.fit(X_train)\n",
    "X_train_transform = minirocket.transform(X_train)\n",
    "# test shape of transformed training data -> (n_instances, 9_996)\n",
    "X_train_transform.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Fit a Classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We suggest using `RidgeClassifierCV` (scikit-learn) for smaller datasets (fewer than ~10,000 training examples), and using logistic regression trained using stochastic gradient descent for larger datasets.\n",
    "\n",
    "**Note**: For larger datasets, this means integrating MiniRocket with stochastic gradient descent such that the transform is performed per minibatch, *not* simply substituting `RidgeClassifierCV` for, e.g., `LogisticRegression`.\n",
    "\n",
    "**Note**: While the input time-series of MiniRocket is unscaled, the output features of MiniRocket may need to be adjusted for following models. E.g. for `RidgeClassifierCV`, we scale the features using the sklearn StandardScaler."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.993410Z",
     "iopub.status.busy": "2020-10-12T17:43:08.947187Z",
     "iopub.status.idle": "2020-10-12T17:43:09.066548Z",
     "shell.execute_reply": "2020-10-12T17:43:09.067299Z"
    }
   },
   "outputs": [],
   "source": [
    "scaler = StandardScaler(with_mean=False)\n",
    "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n",
    "\n",
    "X_train_scaled_transform = scaler.fit_transform(X_train_transform)\n",
    "classifier.fit(X_train_scaled_transform, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 Load and Transform the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:09.071414Z",
     "iopub.status.busy": "2020-10-12T17:43:09.070666Z",
     "iopub.status.idle": "2020-10-12T17:43:09.931075Z",
     "shell.execute_reply": "2020-10-12T17:43:09.931598Z"
    }
   },
   "outputs": [],
   "source": [
    "X_test, y_test = load_arrow_head(split=\"test\", return_X_y=True)\n",
    "X_test_transform = minirocket.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:09.935232Z",
     "iopub.status.busy": "2020-10-12T17:43:09.934675Z",
     "iopub.status.idle": "2020-10-12T17:43:10.031071Z",
     "shell.execute_reply": "2020-10-12T17:43:10.031624Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "X_test_scaled_transform = scaler.transform(X_test_transform)\n",
    "classifier.score(X_test_scaled_transform, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## 2 Multivariate Time Series\n",
    "\n",
    "We can use the multivariate version of MiniRocket for multivariate time series input.\n",
    "\n",
    "### 2.1 Imports\n",
    "\n",
    "Import MiniRocketMultivariate.\n",
    "\n",
    "**Note**: MiniRocketMultivariate compiles via Numba on import."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Load the Training Data\n",
    "\n",
    "**Note**: Input time series must be *at least* of length 9.  Pad shorter time series using, e.g., `PaddingTransformer` (`sktime.transformers.panel.padder`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:10.054652Z",
     "iopub.status.busy": "2020-10-12T17:43:10.034190Z",
     "iopub.status.idle": "2020-10-12T17:43:10.394311Z",
     "shell.execute_reply": "2020-10-12T17:43:10.394905Z"
    }
   },
   "outputs": [],
   "source": [
    "X_train, y_train = load_basic_motions(split=\"train\", return_X_y=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Initialise MiniRocket and Transform the Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:10.410718Z",
     "iopub.status.busy": "2020-10-12T17:43:10.410103Z",
     "iopub.status.idle": "2020-10-12T17:43:11.186318Z",
     "shell.execute_reply": "2020-10-12T17:43:11.186801Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket_multi = MiniRocketMultivariate()\n",
    "minirocket_multi.fit(X_train)\n",
    "X_train_transform = minirocket_multi.transform(X_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 Fit a Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:11.190556Z",
     "iopub.status.busy": "2020-10-12T17:43:11.190017Z",
     "iopub.status.idle": "2020-10-12T17:43:11.396461Z",
     "shell.execute_reply": "2020-10-12T17:43:11.397135Z"
    }
   },
   "outputs": [],
   "source": [
    "scaler = StandardScaler(with_mean=False)\n",
    "X_train_scaled_transform = scaler.fit_transform(X_train_transform)\n",
    "\n",
    "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n",
    "classifier.fit(X_train_scaled_transform, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 Load and Transform the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:11.401025Z",
     "iopub.status.busy": "2020-10-12T17:43:11.400273Z",
     "iopub.status.idle": "2020-10-12T17:43:12.450777Z",
     "shell.execute_reply": "2020-10-12T17:43:12.451162Z"
    }
   },
   "outputs": [],
   "source": [
    "X_test, y_test = load_basic_motions(split=\"test\", return_X_y=True)\n",
    "X_test_transform = minirocket_multi.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.6 Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.494679Z",
     "iopub.status.busy": "2020-10-12T17:43:12.453795Z",
     "iopub.status.idle": "2020-10-12T17:43:12.548017Z",
     "shell.execute_reply": "2020-10-12T17:43:12.548575Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "X_test_scaled_transform = scaler.transform(X_test_transform)\n",
    "classifier.score(X_test_scaled_transform, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## 3 Pipeline Example\n",
    "\n",
    "We can use MiniRocket together with `RidgeClassifierCV` (or another classifier) in a pipeline.  We can then use the pipeline like a self-contained classifier, with a single call to `fit`, and without having to separately transform the data, etc.\n",
    "\n",
    "### 3.1 Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (above)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Initialise the Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.552186Z",
     "iopub.status.busy": "2020-10-12T17:43:12.551660Z",
     "iopub.status.idle": "2020-10-12T17:43:12.553415Z",
     "shell.execute_reply": "2020-10-12T17:43:12.553966Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket_pipeline = make_pipeline(\n",
    "    MiniRocket(),\n",
    "    StandardScaler(with_mean=False),\n",
    "    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Load and Fit the Training Data\n",
    "\n",
    "**Note**: Input time series must be *at least* of length 9.  Pad shorter time series using, e.g., `PaddingTransformer` (`sktime.transformers.panel.padder`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.557100Z",
     "iopub.status.busy": "2020-10-12T17:43:12.556478Z",
     "iopub.status.idle": "2020-10-12T17:43:12.885951Z",
     "shell.execute_reply": "2020-10-12T17:43:12.886625Z"
    }
   },
   "outputs": [],
   "source": [
    "X_train, y_train = load_arrow_head(split=\"train\")\n",
    "\n",
    "# it is necessary to pass y_train to the pipeline\n",
    "# y_train is not used for the transform, but it is used by the classifier\n",
    "minirocket_pipeline.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 Load and Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.890535Z",
     "iopub.status.busy": "2020-10-12T17:43:12.889866Z",
     "iopub.status.idle": "2020-10-12T17:43:13.897048Z",
     "shell.execute_reply": "2020-10-12T17:43:13.897624Z"
    }
   },
   "outputs": [],
   "source": [
    "X_test, y_test = load_arrow_head(split=\"test\")\n",
    "\n",
    "minirocket_pipeline.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## 4 Pipeline Example with MiniRocketMultivariateVariable and unequal length time-series data\n",
    "\n",
    "For a further pipeline, we use the extended version of MiniRocket, the  `MiniRocketMultivariateVariable` for variable / unequal length time series data. Following the code implementation of the original paper of miniRocket, we combine it with `RidgeClassifierCV` in a sklearn pipeline.  We can then use the pipeline like a self-contained classifier, with a single call to `fit`, and without having to separately transform the data, etc.\n",
    "\n",
    "\n",
    "### 4.1 Load japanese_vowels as unequal length dataset\n",
    "Japanese vowels is a a UCI Archive dataset. 9 Japanese-male speakers were recorded saying the vowels 'a' and 'e'. \n",
    "The raw recordings are preprocessed to get a 12-dimensional (multivariate) classification problem. The series lengths are between 7 and 29. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_jv, y_train_jv = load_japanese_vowels(split=\"train\", return_X_y=True)\n",
    "# lets visualize the first three voice recordings with dimension 0-11\n",
    "\n",
    "print(\"number of samples training: \", X_train_jv.shape[0])\n",
    "print(\"series length of recoding 0, dimension 5: \", X_train_jv.iloc[0, 5].shape)\n",
    "print(\"series length of recoding 1, dimension 5: \", X_train_jv.iloc[1, 0].shape)\n",
    "\n",
    "X_train_jv.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# additional visualizations\n",
    "number_example = 153\n",
    "for i in range(12):\n",
    "    X_train_jv.loc[number_example, f\"dim_{i}\"].plot()\n",
    "print(\"Speaker ID: \", y_train_jv[number_example])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 Create a pipeline, train on it\n",
    "As before, we create a sklearn pipeline. \n",
    "MiniRocketMultivariateVariable requires a minimum series length of 9, where missing values are padded up to a length of 9, with the value \"-10.0\".\n",
    "Afterwards a scaler and a RidgeClassifierCV are added.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "minirocket_mv_var_pipeline = make_pipeline(\n",
    "    MiniRocketMultivariateVariable(\n",
    "        pad_value_short_series=-10.0, random_state=42, max_dilations_per_kernel=16\n",
    "    ),\n",
    "    StandardScaler(with_mean=False),\n",
    "    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),\n",
    ")\n",
    "print(minirocket_mv_var_pipeline)\n",
    "\n",
    "minirocket_mv_var_pipeline.fit(X_train_jv, y_train_jv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3 Score the Pipeline on japanese vowels\n",
    "\n",
    "Using the MiniRocketMultivariateVariable, we are able to process also process slightly larger input series than at train time.\n",
    "train max series length: 27, test max series length 29"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_jv, y_test_jv = load_japanese_vowels(split=\"test\", return_X_y=True)\n",
    "\n",
    "minirocket_mv_var_pipeline.score(X_test_jv, y_test_jv)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.10.6 ('env_sktime')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "4a026b07681b07f232f4be37689469f2f19c060d7d208c4f2bde2ef874c4e7ae"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
